To all whom it may concern:
improvements in:



Method and System for Computer-Assisted Test 
Construction Performing Specification Matching 
METHOD AND SYSTEM FOR COMPUTER-ASSISTED TEST CONSTRUCTION
PERFORMING SPECIFICATION MATCHING DURING TEST ITEM SELECTION 

FIELD OF THE INVENTION 
[0001] The present invention generally relates to the field of test construction. The 
present invention particularly relates to a method and system for constructing a test using a 
computer system. Specifically, the present invention relates to a method and system for 
constructing a test using a computer system that performs specification matching during the test 
creation process. 

BACKGROUND OF THE INVENTION 
[0002] Testing services administer a variety of standardized tests. For example, the 
Graduate Management Admission Test® (GMAT®) evaluates graduate business school 
applicants by measuring general verbal, mathematical, and analytical writing skills. The 
Graduate Record Examinations® (GRE®) assists graduate schools and departments in graduate 
admissions activities. Tests offered include the General Test, which measures developed verbal, 
quantitative, and analytical abilities, and the Subject Tests, which measure achievement in 14 
different fields of study. The Scholastic Assessment Test® (SAT®) Program includes the SAT 
I: Reasoning Test and SAT II: Subject Tests. The SAT I is a three-hour test, primarily
multiple-choice, that measures verbal and mathematical reasoning abilities. The SAT II: Subject Tests are
one-hour, mostly multiple-choice, tests in specific subjects. These tests measure knowledge of 
particular subjects and the ability to apply that knowledge. Colleges and universities typically 
use the SAT® Program as a factor in determining admission or placement of prospective 
students. Individual states also administer tests to determine whether and to what extent students
meet state standards for educational achievement. 

[0003] Many tests, such as the above-mentioned tests, are offered multiple times during 
a year and/or are administered over multiple years. It is important, in the case of tests that are 
offered multiple times during a year, that the different administrations of each test be 
approximately equal in difficulty in order to properly rate examinees from different testing dates 
against one another. For tests that are administered over multiple years, it is important that each 
test be of a known difficulty level to accurately assess an examinee's performance and progress. 
Moreover, it may be important to evaluate other psychometric specifications and statistical 
properties for a given test prior to its administration. 

[0004] Some current methods for constructing tests, including those using a computer 
interface, permit a test developer to view and select test items. Other methods can display a 
match between content specifications and the content properties of the selected test items. For 
example, such test construction systems typically keep track of metrics such as the number of 
questions that test a particular subject. On the SAT I, for example, questions are divided into 
mathematics and verbal questions. Additionally, the test construction system could also keep 
track of the number of questions that are devoted to a sub-topic (such as geometry or algebra) or 
that are presented in a certain format (such as an analogy completion, sentence completion or 
word problem). By identifying the number of questions of a particular type included in the 
developed test, the test developer may be alerted if an incorrect number of questions or an 
incorrect number of questions of a particular type are included in the test. 

[0005] However, systems implementing these methods do not combine all of the
features listed above. Such a combination would permit the test developer to develop tests more
quickly while also determining whether the selected test items meet the psychometric
specifications for a test. It would further permit the test developer to examine content or
psychometric specifications during the test development process, so that the test developer can
add, remove or replace test items to correct deficiencies with respect to the test specifications
during the test item selection process.

[0006] Thus, a need exists for an evaluation tool that determines whether defined 
content and psychometric specifications for a test are met by a particular question set. 

[0007] A further need exists for providing psychometric and statistical information to a 
test creator during the test creation process to permit evaluation and adjustment of the selected 
test items during the test creation process. 

SUMMARY OF PREFERRED EMBODIMENTS 
[0008] Before the present methods, systems, and materials are described, it is to be 
understood that this invention is not limited to the particular methodologies, systems and 
materials described, as these may vary. It is also to be understood that the terminology used in 
the description is for the purpose of describing the particular versions or embodiments only, and 
is not intended to limit the scope of the present invention, which will be limited only by the
appended claims. 

[0009] It must also be noted that as used herein and in the appended claims, the singular 
forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. 
Thus, for example, reference to a "test item" is a reference to one or more test items and 
equivalents thereof known to those skilled in the art, and so forth. Unless defined otherwise, all 
technical and scientific terms used herein have the same meanings as commonly understood by 
one of ordinary skill in the art. Although any methods, materials, and devices similar or 
equivalent to those described herein can be used in the practice or testing of embodiments of the
present invention, the preferred methods, materials, and devices are now described. All 
publications mentioned herein are incorporated by reference. Nothing herein is to be construed 
as an admission that the invention is not entitled to antedate such disclosure by virtue of prior 
invention. 

[0010] The present invention includes a database of test items and information 
regarding test items and tests. The information may include the content structure and statistical 
properties of test items and the content and psychometric specifications for tests to be 
constructed from the database. Psychometric specifications include specifications related to the 
measurement of human characteristics. Psychometric specifications may be used to develop 
tests in areas such as intelligence testing, personality testing and vocational testing. 

[0011] The present invention may further include a systematic procedure for selecting 
items for a test using a computer system connected to the database described above. As the test 
is being constructed, the extent to which the developing test matches the content and 
psychometric specifications for the test may be displayed so that a developer may adjust the set 
of items selected to best match those specifications. 

[0012] In a preferred embodiment, a method of constructing a test includes selecting a 
test item for inclusion in a set of selected test items, updating at least one evaluation statistic 
based on the selected test item, and revising the set of selected test items to substantially 
correlate the at least one evaluation statistic with at least one specification for a test. The test 
item may be selected at least in part based on a subject matter for the test item. The at least one 
evaluation statistic may be selected from content specifications and psychometric specifications. 
The at least one specification may also be selected from content specifications and psychometric 
specifications. The content specifications may include a number of test items to be presented in
each of one or more pre-determined formats, a total number of test items to be included in the set 
of selected test items, a number of test items for testing each of one or more pre-determined 
subject matters, a key distribution, a percentage of test items having one or more pre-defined 
characteristics, a gender or racial orientation of test items, and a language in which the test items 
are presented. The psychometric specifications may include an overall test difficulty rating, a 
correlation between a correct response for a selected test item and a particular cognitive or 
behavioral trait, an orientation of the presentation of questions and answers for the set of selected 
test items, a number of pages of text for a test, a mean point-biserial, a mean r-biserial, and an 
arrangement of the set of selected test items. 

[0013] In a preferred embodiment, a method for constructing a test includes selecting a 
portion of a test item database from which to select a set of test items for a test having one or 
more test specifications, displaying information concerning a plurality of test items in the 
selected portion of the test item database, examining a test item on a display device, selecting the 
test item for the test, and updating a value for at least one test specification based on specified 
properties for the selected test item. Selecting a portion of a test item database may be based on 
the subject matter of the test items contained within the portion of the test item database. 
Examining a test item may include viewing an image of the test item, statistical properties of the 
test item, text passages associated with the test item, an answer key, detailed content 
specifications, reviewers' comments, scoring guidelines, and artwork associated with the test
item. The statistical properties may include one or more of a percentage of correct responses for 
the test item, t-biserials, r-biserials, item response theory parameters, gender-based response 
statistics, race-based response statistics, a percentage of responses choosing each distractor, and 
a frequency of previous usage for the test item. Updating a value for at least one test
specification may be performed using item response theory. In an embodiment, the method 
further includes comparing current values for the one or more test specifications with required 
values for the one or more test specifications. In an embodiment, the method further includes 
replacing one or more test items in the set of selected test items based on the one or more 
updated specifications. In an embodiment, the method further includes adding one or more test 
items to the set of selected test items based on the one or more updated specifications. In an 
embodiment, the method further includes removing one or more test items from the set of 
selected test items based on the one or more updated specifications. 

[0014] In a preferred embodiment, a system for constructing a test includes a processor, 
a computer-readable medium operably connected to the processor, and a display. The
computer-readable medium contains one or more databases each having a plurality of test items. Each test
item includes a textual question and one or more answers for the test item, a content structure of 
the test item, and one or more statistical properties for the test item. The statistical properties 
may include a percentage of correct responses for the test item, t-biserials, r-biserials, item 
response theory parameters, gender-based response statistics, race-based response statistics, a 
percentage of responses choosing each distractor, and a frequency of previous usage for the test 
item. The computer-readable medium may further include content specifications for a test, and 
psychometric specifications for the test. In an embodiment, the processor evaluates the content 
specifications and psychometric specifications for a test while the test is being created and 
determines a correlation value between the properties of the plurality of test items for the test and 
the content specifications and psychometric specifications for the test. The display displays the 
correlation value to the test developer. In an embodiment, the computer-readable medium 
further contains instructions for performing a method of constructing a test including selecting a
test item for inclusion in a set of selected test items, updating at least one evaluation statistic 
based on the selected test item, and revising the set of selected test items to substantially 
correlate the at least one evaluation statistic with at least one specification for a test. 

BRIEF DESCRIPTION OF THE DRAWINGS 
[0015] The accompanying drawings, which are incorporated in and form a part of the 
specification, illustrate preferred embodiments of the present invention and, together with the 
description, serve to explain the principles of the invention. The embodiments illustrated in the
drawings should not be read to constitute limiting requirements, but instead are intended to assist 
the reader in understanding the invention. 

[0016] FIG. 1 depicts an exemplary process flow for creating a test according to an 
embodiment of the present invention. 

[0017] FIG. 2 depicts an exemplary system for creating a test according to an 
embodiment of the present invention. 

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 
[0018] The present invention relates to a method and system for constructing a test 
using a computer system. Specifically, the present invention relates to a method and system for 
constructing a test using a computer system that performs specification matching during the test 
creation process. 

[0019] FIG. 1 depicts an exemplary process flow for creating a test according to an 
embodiment of the present invention. First, the test developer may determine 105 the portion of 
the test item database or a particular test item database from which to select a test item for the 
test. The test item database may be a single repository containing all selectable test items for one
or more tests. Alternatively, a test developer may choose questions from a plurality of test item 
databases. The test items contained within a portion of the test item database or within a 
particular test item database may possess distinguishing characteristics. For example, a 
particular test item database may contain test items pertaining only to questions for testing 
knowledge of geometric principles. The characteristics used to distinguish test items within a 
portion of the test item database or a particular test item database from other test items or test 
item databases may correspond to psychometric specifications or content specifications for a test. 
When the test item database or databases are organized in this manner, the test developer may 
more quickly modify the set of selected test items to match specifications that are not satisfied 
during test construction. 

[0020] The test developer may then view 110 a spreadsheet displaying information 
about the test items within the portion of the test item database or the particular test item 
database. A computer system may be used to display the spreadsheet. The test developer may 
examine 115 an individual test item on a display device of the computer system prior to adding 
the test item to the test under development. Examining 115 an individual test item may include 
viewing the test item image, information about the test item and information about related 
entities such as text passages or artwork associated with the test item. Test item information may 
include, for example, an answer key, detailed content specifications, reviewers' comments, 
scoring guidelines (for constructed-response test items), and statistical properties for the test item,
such as a percentage of correct answers received if the test item was previously administered,
t-biserials, r-biserials, item response theory parameters, gender-based response statistics,
race-based response statistics, a percentage of responses choosing each distractor, and the frequency
with which the question or a similar variant has been included on previous test administrations. 
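
By way of non-limiting illustration, two of the item-level statistics of this kind may be
computed from scored responses as sketched below. The specification does not give formulas;
the sketch assumes the conventional definitions of proportion correct and of the point-biserial
correlation (the statistic whose mean is referred to elsewhere herein), and the function names
and data layout are hypothetical.

```python
from math import sqrt


def proportion_correct(item_scores):
    """Fraction of examinees whose 0/1 item score is 1."""
    return sum(item_scores) / len(item_scores)


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between 0/1 item scores and total scores.

    Assumption: conventional definition, using the population
    (divide-by-n) standard deviation of the total scores.
    """
    right = [t for s, t in zip(item_scores, total_scores) if s == 1]
    wrong = [t for s, t in zip(item_scores, total_scores) if s == 0]
    if not right or not wrong:
        raise ValueError("undefined when every examinee has the same item score")
    n = len(total_scores)
    mean_all = sum(total_scores) / n
    sd = sqrt(sum((t - mean_all) ** 2 for t in total_scores) / n)
    if sd == 0:
        raise ValueError("undefined when all total scores are equal")
    p = len(right) / n
    # Gap between mean total score of examinees who got the item right vs. wrong.
    mean_gap = sum(right) / len(right) - sum(wrong) / len(wrong)
    return mean_gap / sd * sqrt(p * (1 - p))
```

The point-biserial computed this way equals the Pearson correlation between the 0/1 item score
and the total score, so an item answered correctly mainly by high scorers yields a value near 1.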

[0021] Upon reviewing the information, the test developer may select 120 individual 
test items for inclusion in the test. As the developer selects test items, the resulting correlation 
between content and psychometric specifications for the test and the corresponding 
characteristics of the user-selected test may be updated 125. If the selected test items do not 
meet one or more specifications, the developer may revise 130 his or her test item selection until 
the best match between the test item properties and the content and psychometric specifications
is achieved. In an exemplary embodiment, item response theory may be used in the matching of 
test item properties to test specifications. In an alternate embodiment, the percentage of 
examinees that answer a test item correctly may be used in the matching of test item properties to 
test specifications. These methods of determining test item properties are merely exemplary and 
are not meant to be limiting. Additional methods for determining test item properties may be 
performed singly or in combination with the above-listed methods and are intended to be 
encompassed within the scope of the present invention without limitation. 
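
By way of non-limiting illustration, the updating 125 of an evaluation statistic as test items
are selected or removed may be sketched as follows. The specification states only that item
response theory may be used; the two-parameter logistic (2PL) model, the class name
SelectedItems, and the parameter names a (discrimination) and b (difficulty) are assumptions
made for this sketch.

```python
from math import exp


def p_correct_2pl(theta, a, b):
    """2PL probability of a correct response at ability theta.

    a is the item discrimination and b the item difficulty (assumed model).
    """
    return 1.0 / (1.0 + exp(-a * (theta - b)))


class SelectedItems:
    """Tracks the selected items; statistics reflect the current selection."""

    def __init__(self):
        self._params = {}  # item_id -> (a, b)

    def add(self, item_id, a, b):
        self._params[item_id] = (a, b)

    def remove(self, item_id):
        del self._params[item_id]

    def expected_score(self, theta):
        """Expected number-correct score at ability theta."""
        return sum(p_correct_2pl(theta, a, b) for a, b in self._params.values())

    def mean_difficulty(self):
        """Mean of the b parameters, or None when nothing is selected."""
        if not self._params:
            return None
        return sum(b for _, b in self._params.values()) / len(self._params)
```

Each call to add or remove changes the values returned by expected_score and mean_difficulty,
which a developer could compare with the target values in the psychometric specifications.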

[0022] FIG. 2 depicts an exemplary system for creating a test according to an 
embodiment of the present invention. The system may include a computer system 200 
containing a processor 205, a display 210 and a computer-readable medium 215, such as a hard 
drive, a floppy disk, a CD, a DVD, RAM, ROM, EPROM, EEPROM or other memory or 
memory storage device. The computer-readable medium 215 may contain a database 220 
including potential test items and information about test items and tests. 

[0023] The information about the test items may include content structure and statistical 
properties of the test items. The content structure may denote the format of the question, the 
information being tested, the style of question, and similar content-related information. The
statistical properties may include the percentage of examinees that select a particular response, 
the correlation between selecting a particular response and exhibiting a particular personality 
trait (in the case of behavioral or psychological testing), and the like. 
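
By way of non-limiting illustration, a record in the database 220 holding the content structure
and statistical properties of a test item might be laid out as sketched below. The class name
ItemRecord, every field name, and the plausibility check are hypothetical; the specification
does not prescribe a storage layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ItemRecord:
    # Question and answers
    item_id: str
    question_text: str
    answers: List[str]
    key: str  # label of the correct answer
    # Content structure
    item_format: str  # e.g. "multiple_choice"
    subject: str  # e.g. "geometry"
    # Statistical properties (absent for items never administered)
    percent_correct: Optional[float] = None
    distractor_percentages: Dict[str, float] = field(default_factory=dict)
    times_used: int = 0

    def response_percentages_plausible(self, tolerance=0.5):
        """Check that recorded response percentages do not exceed 100.

        The total may fall below 100 because some examinees omit an item.
        """
        if self.percent_correct is None:
            return True
        total = self.percent_correct + sum(self.distractor_percentages.values())
        return total <= 100.0 + tolerance
```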

[0024] The information regarding the tests may include content specifications and 
psychometric specifications for tests to be constructed from the database 220. The content 
specifications may list requirements for the test such as requiring a certain percentage of test 
items to be in multiple-choice format, to test verbal skills, or to be of a specified length. Content 
specifications may also include, without limitation, specifying the overall test length, the number 
of test items presented on a particular topic, a key distribution, the percentage of test items with 
particular characteristics, a gender or racial orientation of items and the language in which the 
test is presented. Psychometric specifications may include, without limitation, a preferred 
overall test difficulty, a correlation between a correct response for a test item and a particular 
cognitive or behavioral trait, the orientation of the presentation of questions and answers for test 
items, mean point-biserials, mean r-biserials and the visual presentation of the testing materials. 
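
By way of non-limiting illustration, the comparison of a set of selected test items against
count-based content specifications (such as the number of items per format or topic, or a key
distribution) may be sketched as follows. The function names and the dictionary layout of items
and specifications are hypothetical.

```python
from collections import Counter


def match_content_specs(selected_items, required_counts):
    """Compare current counts with required counts.

    selected_items: list of dicts, one per item, mapping attribute -> value.
    required_counts: dict mapping attribute -> {value: required count}.
    Returns {(attribute, value): (current count, required count)}.
    """
    report = {}
    for attribute, targets in required_counts.items():
        current = Counter(item[attribute] for item in selected_items)
        for value, required in targets.items():
            report[(attribute, value)] = (current[value], required)
    return report


def unmet_specs(report):
    """Keep only the entries whose current count differs from the requirement."""
    return {k: v for k, v in report.items() if v[0] != v[1]}
```

For example, a specification requiring two geometry items against a selection containing one
would appear in the result of unmet_specs as ("subject", "geometry"): (1, 2).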

[0025] The above-listed specifications are merely representative of specifications and 
properties that may be included in the database 220. It will be evident to one of skill in the art 
that more or fewer properties and specifications may be included in the database and still be 
within the scope of the invention. 

[0026] The computer-readable medium 215 or a second computer-readable medium 
225 operably connected to the processor 205 may contain a computer program for implementing 
a systematic procedure for selecting items for a test. The program may display the extent to 
which a test under development matches the content and psychometric specifications as a user 
constructs the test. In this way, the user may replace, remove, or add one or more test items to
the set of selected test items to best match those specifications in an efficient manner. 
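
By way of non-limiting illustration, a program of this kind might assist the user in replacing
a test item by reporting the single replacement that brings a statistic closest to its target.
The specification describes the user making such revisions; the automated suggestion below,
the function name best_single_swap, and the use of mean difficulty as the statistic are
assumptions made for this sketch.

```python
def best_single_swap(selected, pool, target_mean):
    """Find the one-for-one replacement that best matches a target mean.

    selected, pool: dicts mapping item_id -> difficulty value.
    Returns (id_to_remove, id_to_add, new_mean), or None when no
    replacement moves the mean closer to target_mean.
    """
    n = len(selected)
    total = sum(selected.values())
    best = None
    best_gap = abs(total / n - target_mean)
    for out_id, out_value in selected.items():
        for in_id, in_value in pool.items():
            if in_id in selected:
                continue  # already on the test
            new_mean = (total - out_value + in_value) / n
            gap = abs(new_mean - target_mean)
            if gap < best_gap:
                best, best_gap = (out_id, in_id, new_mean), gap
    return best
```

The search examines every pairing of a selected item with a pool item, which is adequate for an
interactive tool working on one portion of the test item database at a time.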

[0027] Although the invention has been described with reference to the preferred 
embodiments, it will be apparent to one skilled in the art that variations and modifications are 
contemplated within the spirit and scope of the invention. The drawings and description of the 
preferred embodiments are made by way of example rather than to limit the scope of the 
invention, and it is intended to cover within the spirit and scope of the invention all such changes 
and modifications. 